Supervised and unsupervised PCFG adaptation to novel domains
Abstract
This paper investigates adapting a lexicalized probabilistic context-free grammar (PCFG) to a novel domain using maximum a posteriori (MAP) estimation. The MAP framework is general enough to include some previous model adaptation approaches, such as the corpus mixing of Gildea (2001). Other approaches falling within this framework are more effective. In contrast to the results in Gildea (2001), we show F-measure parsing accuracy gains of as much as 2.5% for high-accuracy lexicalized parsing through the use of out-of-domain treebanks, with the largest gains when the amount of in-domain data is small. MAP adaptation can be based on either supervised or unsupervised adaptation data. Even when no in-domain treebank is available, unsupervised techniques provide a substantial accuracy gain over unadapted grammars, with F-measure improvements of up to nearly 5%.
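To make the relation between MAP estimation and corpus mixing concrete, here is a minimal sketch. It assumes a Dirichlet prior centered on the out-of-domain rule distribution, so that the estimate for a rule is (tau * P_out(rule) + c_in(rule)) / (tau + c_in(lhs)). The function names, the prior weight `tau`, the mixing weight `alpha`, and the toy counts are hypothetical illustrations, not values or code from the paper.

```python
from collections import Counter


def map_estimate(in_counts, out_counts, tau):
    """MAP-style estimate of P(rule | lhs) for a single left-hand side.

    Assumed form: (tau * P_out(rule) + c_in(rule)) / (tau + total in-domain count),
    where P_out is the relative-frequency estimate from out-of-domain counts.
    Requires at least one out-of-domain observation and tau + in-domain total > 0.
    """
    out_total = sum(out_counts.values())
    in_total = sum(in_counts.values())
    denom = tau + in_total
    rules = set(in_counts) | set(out_counts)
    return {
        r: (tau * out_counts.get(r, 0) / out_total + in_counts.get(r, 0)) / denom
        for r in rules
    }


def count_merging(in_counts, out_counts, alpha=1.0):
    """Corpus mixing: scale out-of-domain counts by alpha, add in-domain counts,
    then renormalize to relative frequencies."""
    merged = Counter()
    for rule, count in out_counts.items():
        merged[rule] += alpha * count
    for rule, count in in_counts.items():
        merged[rule] += count
    total = sum(merged.values())
    return {rule: count / total for rule, count in merged.items()}


if __name__ == "__main__":
    # Toy expansion counts for the nonterminal NP (made up for illustration).
    out_domain = {"NP -> DT NN": 60, "NP -> NNP": 40}
    in_domain = {"NP -> DT NN": 2, "NP -> PRP": 8}
    alpha = 0.1
    # Choosing tau = alpha * (out-of-domain total) makes the two estimates coincide.
    tau = alpha * sum(out_domain.values())
    print(map_estimate(in_domain, out_domain, tau))
    print(count_merging(in_domain, out_domain, alpha))
```

Under these assumptions, setting the prior weight proportional to the out-of-domain count total reproduces count merging; other choices of `tau` give different members of the same family of estimators.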
Similar resources
Deep Unsupervised Domain Adaptation for Image Classification via Low Rank Representation Learning
Domain adaptation is a powerful technique when a large amount of labeled data with similar attributes is available in different domains. In real-world applications, a huge amount of data is available, but most of it is unlabeled. Domain adaptation is effective in image classification, where obtaining adequately labeled data is expensive and time-consuming. We propose a novel method named DALRRL, which consists of deep ...
Use of unsupervised word classes for entity recognition: Application to the detection of disorders in clinical reports
Unsupervised word classes induced from unannotated text corpora are increasingly used to help tasks addressed by supervised classification, such as standard named entity detection. This paper studies the contribution of unsupervised word classes to a medical entity detection task with two specific objectives: How do unsupervised word classes compare to available knowledge-based semantic classes...
Cache-based Dynamic PCFG Adaptation using MAP Estimation
This paper presents a cache-based dynamic adaptation technique for lexicalized probabilistic context-free grammar (LPCFG). Expected counts from machine-parsed sentences of in-domain data are stored in a cache and combined with prior counts from hand-annotated parses of out-of-domain data using maximum a posteriori (MAP) estimation. This adaptation is unsupervised, and dynamic with an adap...
MAP adaptation of stochastic grammars
This paper investigates supervised and unsupervised adaptation of stochastic grammars, including n-gram language models and probabilistic context-free grammars (PCFGs), to a new domain. It is shown that the commonly used approaches of count merging and model interpolation are special cases of a more general maximum a posteriori (MAP) framework, which additionally allows for alternate adaptation ...
Adaptive Pattern Recognition to Ensure Clinical Viability over Time
Pattern Recognition is a useful tool for deciphering movement intent from myoelectric signals. In order to be clinically viable over time, recognition paradigms must be capable of adapting with the user. Most existing paradigms are static, although two forms of adaptation have received limited attention: Supervised adaptation achieves high accuracy, since the intended class is known, but at the...